Succinct colored de Bruijn graphs
Abstract
Motivation: In 2012, Iqbal et al. introduced the colored de Bruijn graph, a variant of the classic de Bruijn graph aimed at 'detecting and genotyping simple and complex genetic variants in an individual or population'. Because these graphs are intended to be applied to massive population-level data, it is essential that they be represented efficiently. Unfortunately, current succinct de Bruijn graph representations are not directly applicable to the colored de Bruijn graph, which requires additional information to be succinctly encoded as well as support for non-standard traversal operations.
Results: Our data structure dramatically reduces the amount of memory required to store and use the colored de Bruijn graph, with some penalty to runtime, allowing it to be applied in much larger and more ambitious sequence projects than was previously possible.
Availability and Implementation: https://github.com/cosmo-team/cosmo/tree/VARI.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
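As a rough illustration of the structure the abstract refers to, the following Python sketch stores a colored de Bruijn graph in a plain dictionary, mapping each k-mer (treated as an edge between two (k-1)-mers) to the set of samples, or colors, that contain it. It is an uncompressed toy and not the succinct representation the paper describes; the function names `build_colored_dbg` and `successors` and the example sequences are assumptions made for illustration only.

```python
from collections import defaultdict


def build_colored_dbg(samples, k):
    """Map each k-mer (edge) to the set of sample indices (colors) containing it.

    Hypothetical helper: an uncompressed dictionary, not a succinct encoding.
    """
    colors = defaultdict(set)
    for color, seq in enumerate(samples):
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(color)
    return dict(colors)


def successors(colors, node, color=None):
    """Return the outgoing edges (k-mers) of a (k-1)-mer node.

    If `color` is given, only edges carrying that color are returned,
    which mimics a color-restricted traversal step.
    """
    out = []
    for base in "ACGT":
        edge = node + base
        if edge in colors and (color is None or color in colors[edge]):
            out.append(edge)
    return out


if __name__ == "__main__":
    # Two made-up samples that share a prefix and then diverge.
    graph = build_colored_dbg(["ACGTA", "ACGAA"], k=3)
    print(graph["ACG"])                      # colors of the shared edge
    print(successors(graph, "CG"))           # all edges leaving node CG
    print(successors(graph, "CG", color=0))  # only edges seen in sample 0
```

In this toy, the node `CG` has two outgoing edges, one per sample, which is the kind of branch a variant-detection traversal would look for. A succinct representation would have to answer the same queries without storing the k-mers and color sets explicitly.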
Similar resources
Rainbowfish: A Succinct Colored de Bruijn Graph Representation
The colored de Bruijn graph – a variant of the de Bruijn graph which associates each edge (i.e., k-mer) with some set of colors – is an increasingly important combinatorial structure in computational biology. Iqbal et al. demonstrated the utility of this structure for representing and assembling a collection (population) of genomes, and showed how it can be used to accurately detect genetic var...
Disentangled Long-Read De Bruijn Graphs via Optical Maps
While long reads produced by third-generation sequencing technology from, e.g., Pacific Biosciences have been shown to increase the quality of draft genomes in repetitive regions, fundamental computational challenges remain in overcoming their high error rate and assembling them efficiently. In this paper we show that the de Bruijn graph built on the long reads can be efficiently and substantial...
On k-colored Lambda Terms and Their Skeletons
The paper describes an application of logic programming to the modeling of difficult combinatorial properties of lambda terms, with focus on the class of simply typed terms. Lambda terms in de Bruijn notation are Motzkin trees (also called binary-unary trees) with indices at their leaves counting up on the path to the root the steps to their lambda binder. As a generalization of affine lambda t...
The Collatz conjecture and De Bruijn graphs
We study variants of the well-known Collatz graph, by considering the action of the 3n + 1 function on congruence classes. For moduli equal to powers of 2, these graphs are shown to be isomorphic to binary De Bruijn graphs. Unlike the Collatz graph, these graphs are very structured, and have several interesting properties. We then look at a natural generalization of these finite graphs to the 2-...
On the recognition of de Bruijn graphs and their induced subgraphs
The directed de Bruijn graphs appear often as models in computer science, because of the useful properties these graphs have. Similarly, the induced subgraphs of these graphs have applications related to the sequencing of DNA chains. In this paper, we show that the directed de Bruijn graphs can be recognized in polynomial time. We also show that it is possible to recognize in polynomial time wh...
Journal: Bioinformatics
Volume 33, Issue 20
Publication date: 2017